Take-home Exercise 1 Part 2

Data Visualisation Makeover

Author

Seng Jing Yi

Published

May 3, 2024

Modified

May 8, 2024

1. Overview

1.1 Objective:

This exercise seeks to improve the original visualisation with reference to Ben Jones’ article “A Tale of Four Quadrants”.

1.2 Key elements for a good piece of visualisation:

Clarity Aesthetics
  • 2D columns facilitates easy gauge.

  • Y axis starts from 0 to prevent misleading comparison

  • Call-outs with image and text aid recognition.

  • Relevant title and lead-in verbiage provide context for readers

  • Readable font, relevant to context of data

  • Inclusion of image, relevant to subject

  • Good choice of color throughout

  • Soft grid lines to prevent distraction

  • Elements are well aligned and spaced

1.3 Recap of initial data-set

The transaction data of REALIS is downloaded from URA to cover private residential property transactions from 1st January 2023 to 31st March 2024.

Key fields include:

Column Header Description Value Range
Sale Date Date of transaction 1 Jan 23 - 31 Mar 24
Property Type Property category related to build type Executive Condo, Condo, Apartment, Terrace, Semi-Detached, Detached House
Transacted Price Price agreed between the purchaser and the vendor for property and land. It excludes stamp duties, agency fees, legal and other professional fees. SGD 440,000 to SGD 392,180,000
Unit Price ($ PSF) Cost of property per square feet (generally provided by developers) SGD 138 to SGD 5,756

2. Visualisation selected for make-over

2.1 Context of original visualisation

The combined plot of (1) “Unit Price ($ PSF) across Property Types” and (2) “Transacted Price ($) across Property Types” aim to portray the narrative that “Higher Median Transaction Prices does not imply Higher Median Unit Price ($ PSF) due to other factors such as:

  1. Type of property - not equitable to compare apple to oranges.
  2. Density distribution of prices varies across property type.

Insights derived from Take Home Assignment 1

Here are some findings w.r.t PSFand Transaction Prices across all the Property Types:

  • Finding1: Both the Interquartile Range (IQR) and density distribution is the narrowest for Executive Condominiums, closely followed by Apartments, Condominiums, Semi-Detached, and Terrace Houses. It is the widest for Detached Houses.

    • What this means is that: The spread or variability of the PSF is the least for Executive Condominium, followed by Apartments, Condominiums, Semi-Detached, and Terrace Houses. In other words, the prices for these property types tend to cluster more closely around the median, as indicated by their smaller Interquartile Ranges (IQRs).On the other hand, Detached Houses have the largest IQR, indicating that the prices for Detached Houses are more spread out across a wider range. This means that there can be significant variability in prices for Detached Houses, with some properties priced much higher or lower than the median.
  • Finding2: The Median is the lowest for Detached Houses, followed by Semi-Detached Houses, Executive Condominiums, and Terrace Houses. Condominiums and Apartments have the highest median PSF. However, upon comparing this with the distribution plot in Chart II, it becomes apparent that Property Types such as Apartments, Condominiums, and Executive Condominiums also generally have lower median Transacted Pricesdespite having higher PSF.

    • What this means is that: Comparing PSF between dissimilar property types, such as a condo and a landed property, may not provide an accurate sense of value and affordability. When comparing properties for value, PSF is best used between two similar property type or better, within the same project or block.(Chong, 2020)

2.2 Critique

2.2.1 - Clarity

No Area of Concern Potential confusion
1 Misaligned title of 2 separate charts of different font sizes Plot II on Transacted Price seems like a subplot of Plot I on Unit Price ($PSF) when the 2 plots should be equal comparable to convey the message of non-linear correlation.
2 Plot labels of “I” and “II” is not obvious with missing plot boundaries With plot II’s label intersecting with plot I’s grid, it seems that both “I” and “II” are annotations on plot “I”, but does not contribute much meaningful insights as redundant labels.
3 Misaligned y axis makes it difficult to compare apple-to-apple for various Property Type. With different plot size and un-level y-axis for Plot I and Plot II, viewers are subjected to transposition error when attempting to compare across Property Type.
4 Small dot plot size makes visualising of raw data bins less meaningful as the position of dots seems to contribute to a similar linear line for “Apartment, Condo, Terrace”.

The purpose of rain cloud plots is to visualise raw data with dot plot, distribution of data with half-violin plot and key summary statistics with box-plot.

However, key statistic annotations were not included on box plot, and the dot plot size is too tiny to see distribution of underlying data.

5 Plot II’s x-axis on Transacted Price ($) does not have values displayed. Lack of labels on Transacted Price ($) x-axis for Plot II makes it difficult for viewers to understand the magnitude of the price distribution.
Warning

Comparing Unit Price ($PSF) and Transacted Price ($) directly can be potentially misleading with unit size (Area (SQFT)) being another key influencing factor. For instance, while some detached houses have the lowest Unit ($PSF) it is not the most affordable due to its large unit sizes making total price much higher and less affordable.

2.2.2 - Aesthetic

No Area of Interest Potential turn-off
1

Different font size for Plot I and Plot II charts

Sub-title is a tongue twister and is over-generalised.

Smaller title size of Plot II may lead to viewers de-priortizing it’s importance.

Readability of subtitle can be improved such as including key statistics to capture reader’s attention and annotations to support the narrative of the disproportionate relationship between transacted price and unit price across property types.

2 Plot I’s color scheme is lacking contrast with grey plot with grey background With grey plot on grey background, this gives the impression of a plain chart and does not capture viewer’s attention. Moreover, the characteristics of property type are not captured within the chart (e.g., High rise vs landed)
3 Plot II’s ridges are opaque (dark purple) and overlapping clusters at the left for most Property Type Viewers are unable to identify the height of the sales distribution and with the density plot being based on each Property Type, the relative units sold is not included within the plot. (e.g., Apartment & Condo having much more sale than Terrace, but height is similar)
4 Plot I lack of grid lines to reference Unit Price ($PSF). While grid lines are enforced in Plot I’s underlying R code, the contrast is not obvious, especially for reference to Unit Price ($PSF) values that is of interest to viewers.

2.3 Key Areas for Improvement with Proposed Sketch

Original Visualisation Recommendation Reasoning

X: Unit Price ($PSF)

Y: Property Type

Flipping axis to X: Property Type and Y: Unit Price ($PSF).

Alignment of 2 plots to similar size for comparison.

As property type is a categorical variable, placing it on x-axis enables easier comparison where viewers can scan across property types easily.

More importantly, “Unit Price ($PSF) is a continuous variable and y-axis enables easier scaling when combining 2 charts together.

Order of property type based on Alphabetical order from low to high

Re-ordering property type axis variables for easier reference to comparables based on property type characteristics and arrange by median Unit Price ($PSF).

Introducing colors to group property of similar nature and making the chart more visually appealing.

Property types can be group into 2 clusters.

  1. High Rise Residential (EC, Condo, Apartment)
  2. Landed Properties (Terrace, Semi-D, Detached Houses)

This reordering helps viewers easily compare asset classes of similar nature, and reordering by median price makes it visually appealing to see the increase over asset class.

Grid lines and color of background is not contrasting for viewers. Include soft grid lines at Unit Price ($PSF) intervals to guide users when reading the Y-axis and enforce “0” line. As the plot spreads across 6 property type, it is difficult to quickly identify the “Unit Price ($PSF) with viewer’s eyeballing. Grid lines will help guide more accurate.

3 Revamping

3.1 Preparation

  1. Loading relevant library and raw data file with read_csv.

  2. Pre-processing of data to narrow scope of visualisation to Individual Purchases (No of Units Sold = 1) and Tenure of 99 years.

Code
# Loading packages
pacman::p_load(tidyverse,ggrepel, patchwork, hrbrthemes, forcats,ggdist, ggridges,
               ggthemes,lubridate, knitr,ggiraph, plotly, gganimate, cowplot, grid, gridExtra)

# Import data
realis_1 <- read_csv("data/ResidentialTransaction20240308160536.csv")
realis_2 <- read_csv("data/ResidentialTransaction20240308160736.csv")
realis_3 <- read_csv("data/ResidentialTransaction20240308161009.csv")
realis_4 <- read_csv("data/ResidentialTransaction20240308161109.csv")
realis_5 <- read_csv("data/ResidentialTransaction20240414220633.csv")

combined_data <- rbind(realis_1, realis_2, realis_3, realis_4, realis_5)

# Following the same data processing steps
## Dropping records where Area (SQM) is not null
combined_data <- combined_data %>% 
                  filter(!is.na(`Area (SQM)`))

## Converting sale date to date format
combined_data <- combined_data %>%
  mutate(`Sale Date` = as.Date(`Sale Date`, format = "%d %b %Y"))

## Removing duplicates - There are no dupicates. 
duplicates <- combined_data[duplicated(combined_data$Address), ]
filtered_data <- combined_data[combined_data$Address %in% duplicates$Address, ] 

## Including conditions to focus visualisation on individual sales and Tenure of 99 years. 

# Define the new labels for each category

combined_data = combined_data %>%
  mutate(`Property Type` = recode(`Property Type`, 
                        "Apartment" = "Apartment",
                        "Condominium" = "Condo",
                        "Executive Condominium" = "Executive Condos",
                        "Terrace House" = "Terrace",
                        "Semi-Detached House" = "Semi- Detached",
                        "Detached House" = "Detached"))

# Filter Conditions
resale_data <- combined_data %>%
  filter(`Number of Units` == 1)%>%
  filter(grepl("99 yrs", Tenure, ignore.case = TRUE))

3.2 Revising Plot I - Distribution of Unit Price ($ PSF) across Property Types

  1. Adding new category of prop_clust to label property type into “Landed” and “High Rise”.

  2. Enforcing factoring to reorder the Property Type to “Executive Condos”,“Apartment”, “Condo”, “Terrace”, “Semi- Detached”,“Detached”.

  3. Introducing fixed color scheme of blue for “High Rise” and orange for “Landed”

  4. Calculating key statistics such as “Median” for subsequent annotation.

Code
## Segmenting into 2 clusters of property type and use facet_grid()

resale_data <- resale_data %>% 
  mutate(prop_clust = case_when(
  `Property Type` %in% c("Detached", "Semi- Detached", "Terrace") ~ "Landed", 
  `Property Type` %in% c("Condo", "Apartment", "Executive Condos") ~ "High rise"
))

## Introducing factors to reorder the records
prop_order <- c("Executive Condos","Apartment", "Condo", "Terrace", "Semi- Detached","Detached")

resale_data$`Property Type` <- factor(resale_data$`Property Type`, levels = prop_order)



## Introducing fill color to segment the different property types
prop_col <- c("Apartment" = "#CCE5FF", 
              "Executive Condos" = "#99FFFF", 
              "Condo" = "#99CCff", 
              "Terrace" = "#FFE5CC", 
              "Semi- Detached" = "#FFCC99", 
              "Detached" = "#FF9933")


median_prices <- resale_data %>%
  group_by(`Property Type`) %>%
  summarize(Median_Price = median(`Unit Price ($ PSF)`))

# Adjusting column header for kable display
resale_data <- resale_data %>% 
  rename("Transacted_Price" = "Transacted Price ($)") %>% 
  rename("Area_SQFT" = "Area (SQFT)") %>%
  rename("Unit_Price_PSF" = "Unit Price ($ PSF)") %>% 
  rename("Number_of_Units" = "Number of Units") %>% 
  rename("Project_Name" = "Project Name")

# Define the plot
plot1 <- ggplot(resale_data, aes(x = `Unit_Price_PSF`,
                                 y = `Property Type`, 
                                 fill = `Property Type`)) +
  geom_rect(
    aes(ymin = 0.5, ymax = 3.475, xmin = 0, xmax = Inf), 
    fill = NA, color = "#0080FF", linewidth = 1, linetype = "solid") +
  geom_rect(
    aes(ymin = 3.525, ymax = 6.5, xmin = 0, xmax = Inf), 
    fill = NA, color = "#FF8000", linewidth = 1, linetype = "solid") + 
  # Color settings
  scale_fill_manual(values = prop_col) +
  scale_y_discrete(labels = function(y) str_wrap(y, width = 5)) +
  
  # Geometries
  stat_halfeye(
    adjust = 0.5, justification = -0.2, .width = 1, point_colour = NA) +
  geom_boxplot(
    width = 0.2, outlier.shape = NA) +
  stat_dots(
    side = "left", justification = 1.2, binwidth = 25, dotsize = 0.02) +
  geom_text(data = median_prices, 
            aes(x = Median_Price, y = `Property Type`, 
                label = paste0("$", Median_Price)),
            hjust = -0.4, vjust = 0, size = 2) +
   annotate("text", x = Inf, y = 2, label = "High Rise", vjust = 1.5, fontface = "bold", color = "#0080FF", size = 3) +
  annotate("text", x = Inf, y = 5, label = "Landed", vjust = 1.5, fontface = "bold", color = "#FF8000", size = 3) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5), labels = scales::label_number(scale = 1), limits = c(0, NA)) +
  
  # Coordinate and theme settings
  coord_flip() +
  theme_ipsum(axis_title_size = 10, base_size = 7, grid = "Y") +
  theme(plot.margin = margin(10, 10, 10, 10),
        plot.title = element_text(size = 10, hjust = 0.3),
        axis.title.x = element_text(hjust = 0.5, size = 8),
        axis.title.y = element_text(hjust = 0.5),
        plot.background = element_rect(fill = "#FFFFFF", colour = "#f5f5f5"),
        panel.border = element_blank(), 
        legend.position = "top", 
        legend.justification = "center", 
        legend.direction = "horizontal", 
        ) +
  guides(fill = guide_legend(nrow = 1)) +
  
  # Labels and annotations
  labs(x = "Unit Price ($ PSF)", y = NULL,
       title = "Distribution of Unit Price ($ PSF) across Property Types") 

# Display the plot
plot1

Code
# Plotting for high rise
high_rise_1 <- subset(resale_data,  resale_data$`prop_clust` == "High rise")
landed_1 <- subset(resale_data,  resale_data$`prop_clust` == "Landed")

# Recalculating median price for hr
median_prices_hr <- high_rise_1 %>%
  group_by(`Property Type`) %>%
  summarize(Median_Price = median(`Unit_Price_PSF`))

# Plot for high rise only
plot1a <- plot1 <- ggplot(high_rise_1, aes(x = `Unit_Price_PSF`,
                                 y = `Property Type`, 
                                 fill = `Property Type`)) +
  # Color settings
  scale_fill_manual(values = prop_col) +
  scale_y_discrete(labels = function(y) str_wrap(y, width = 5)) +
  
  # Geometries
  stat_halfeye(
    adjust = 0.5, justification = -0.2, .width = 1, point_colour = NA) +
  geom_boxplot(
    width = 0.2, outlier.shape = NA) +
  stat_dots(
    side = "left", justification = 1.2, binwidth = 25, dotsize = 0.02) +
  geom_text(data = median_prices_hr, 
            aes(x = Median_Price, y = `Property Type`, 
                label = paste0("$", Median_Price)),
            hjust = -0.8 , vjust = 0, size = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5), labels = scales::label_number(scale = 1), limits = c(0, NA)) +
  
  # Coordinate and theme settings
  coord_flip() +
  theme_ipsum(base_size = 7, grid = "Y")+ 
  theme(plot.margin = margin(10, 10, 10, 10),
        plot.title = element_text(size = 12, hjust = 0.3),
        axis.title.x = element_text(hjust = 0.5, size = 8, face = "bold"),
        axis.title.y = element_text(hjust = 0.5, size = 8, face = "bold"),
        plot.background = element_rect(fill = "#FFFFFF", colour = "#f5f5f5"),
        panel.border = element_blank(), 
        legend.position = "top", 
        legend.justification = "center", 
        legend.direction = "horizontal"
       ) +
  guides(fill = guide_legend(nrow = 1)) +
  
  # Labels and annotations
  labs(x = "Unit Price ($ PSF)", y = NULL,
       title = "Unit Price Distribution") 

# Display the plot
#plot1a

# Recalculating median price for hr
median_prices_l <- landed_1 %>%
  group_by(`Property Type`) %>%
  summarize(Median_Price = median(`Unit_Price_PSF`))


# Plot for landed only
plot1b <- ggplot(landed_1, aes(x = `Unit_Price_PSF`,
                                 y = `Property Type`, 
                                 fill = `Property Type`)) +
  # Color settings
  scale_fill_manual(values = prop_col) +
  scale_y_discrete(labels = function(y) str_wrap(y, width = 5)) +
  
  # Geometries
  stat_halfeye(
    adjust = 0.5, justification = -0.2, .width = 1, point_colour = NA) +
  geom_boxplot(
    width = 0.2, outlier.shape = NA) +
  stat_dots(
    side = "left", justification = 1.2, binwidth = 25, dotsize = 0.02) +
  geom_text(data = median_prices_l, 
            aes(x = Median_Price, y = `Property Type`, 
                label = paste0("$", Median_Price)),
            hjust = -0.8 , vjust = 0, size = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5), labels = scales::label_number(scale = 1), limits = c(0, NA)) +
  
  # Coordinate and theme settings
  coord_flip() +
  theme_ipsum(base_size = 7, grid = "Y")+ 
  theme(plot.margin = margin(10, 10, 10, 10),
        plot.title = element_text(size = 12, hjust = 0.3),
        axis.title.x = element_text(hjust = 0.5, size = 8, face = "bold"),
        axis.title.y = element_text(hjust = 0.5, size = 8, face = "bold"),
        plot.background = element_rect(fill = "#FFFFFF", colour = "#f5f5f5"),
        panel.border = element_blank(), 
        legend.position = "top", 
        legend.justification = "center", 
        legend.direction = "horizontal"
       ) +
  guides(fill = guide_legend(nrow = 1)) +
  
  # Labels and annotations
  labs(x = "Unit Price ($ PSF)", y = NULL,
       title = "Unit Price Distribution") 

# Display the plot
#plot1b

3.3 Revising Plot II - Distribution of Transacted Price

  1. Understanding the key statistics for the 2 property cluster of “High Rise” and “Landed”.
High Rise Transacted_Price Area_SQFT Unit_Price_PSF
Min $440,000 355.2 $286
Q1 (25%) $1,231,120 710.4 $1,343
Median $1,514,000 968.8 $1,676
Mean $1,711,129 999.3 $1,767
Q3 (75%) $1,954,320 1205.6 $2,150
Max $18,800,000 8697.3 $4,504
Landed Transacted_Price Area_SQFT Unit_Price_PSF
Min $780,000 1,344 $252
Q1 (25%) $2,575,000 1,778 $1,027
Median $3,569,000 2,580 $1,389
Mean $4,098,191 2,942 $1,470
Q3 (75%) $4,280,000 3,577 $1,814
Max $36,500,000 29,202 $4,456
  1. Plotting ridge plot for the “High Rise” and “Landed” with the color codes by quantile.
  2. Limiting the range of `Transacted Price ($)` to visualise the majority of the data points. This is justified as the outliers form a statistically small proportion and is removed for better representation of distribution.
Property Cluster Property Type Range of Transacted Price Points Excluded
High Rise EC, Apartment, Condo $ 0 - $ 4.5 Mil 202 out of 18,329 records
(Approx 1.10%)
Landed Terrace, Semi-Detached, Detached $ 0 - $ 30 Mil 1 out of 700 records
(Approx 0.14%)
Code
# Creating subset to draw 2 ridges

high_rise <- subset(resale_data,  resale_data$`prop_clust` == "High rise")
landed <- subset(resale_data,  resale_data$`prop_clust` == "Landed")

# Summary 
#kable(summary(landed[,c("Transacted_Price", "Area_SQFT", "Unit_Price_PSF", #"Number_of_Units")]))
#kable(summary(high_rise[,c("Transacted_Price", "Area_SQFT", "Unit_Price_PSF", #"Number_of_Units")]))

# Creating Annotation
median_Sprices <- high_rise %>%
  group_by(`Property Type`) %>%
  summarize(Median_Price = median(`Transacted_Price`))

median_S2prices <- landed %>%
  group_by(`Property Type`) %>%
  summarize(Median_Price = median(`Transacted_Price`))

#assigning color to quantile
quan_col <- c("Executive Condos.0.25" = "#DAFFFF", 
              "Executive Condos.0.5" = "#BBFFFF", 
              "Executive Condos.0.75" = "#99FFFF",
              "Apartment.0.25" = "#E5F2FF", 
              "Apartment.0.5" = "#CCE5FF",
              "Apartment.0.75" = "#C0DEFF",
              "Condos.0.25" = "#99CCFF", 
              "Condos.0.5" = "#83C1FF", 
              "Condos.0.75" = "#66B2FF")

#
plot2a <- ggplot(high_rise, aes(x = `Transacted_Price`,
                                 y = `Property Type`
                                )) +
  stat_density_ridges(aes(fill = stat(quantile)), 
    quantile_lines = FALSE,
                      calc_ecdf = TRUE, geom = "density_ridges_gradient", 
                      height = 1 , color = NA, position = position_dodge(0.5), alpha = 0.5) +
  
  scale_fill_brewer()  +
  
  # Color settings
  scale_y_discrete(labels = function(y) str_wrap(y, width = 5), expand = expansion(add = 0.1)) +
  
  # Geometries
  geom_boxplot(
    width = 0.15, outlier.shape = NA) +
  stat_dots(
    side = "left", justification = 1.1, binwidth = 50000, dotsize = 0.01) +
  geom_text(data = median_Sprices, 
            aes(x = Median_Price, y = `Property Type`, 
                label = paste0("$", Median_Price/1000, "k")),
            hjust = -0.5, vjust = 0, size = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5), labels = scales::label_number(scale = 1), limits = c(0, 4500000)) +
  
  # Coordinate and theme settings
  coord_flip() +
  theme_ipsum(base_size = 7, grid = "Y") +
  theme(plot.margin = margin(10, 10, 10, 10),
        plot.title = element_text(size = 12, hjust = 0.3),
        axis.title.x = element_text(hjust = 0.5, size = 8, face = "bold"),
        axis.title.y = element_text(hjust = 0.5, size = 8, face = "bold"),
        plot.background = element_rect(fill = "#FFFFFF", colour = "#f5f5f5"),
        panel.border = element_blank(), 
        legend.position = "top", 
        legend.justification = "right") +
  
  # Labels and annotations
  labs(title = "Transacted Price Distribution", x = "Transacted Price ($)", y = NULL) 

# Display the plot
# plot2a

# Plotting for landed property only
plot2b <- ggplot(landed, aes(x = `Transacted_Price`,
                                 y = `Property Type`
                                )) +
  stat_density_ridges(aes(fill = stat(quantile)), 
    quantile_lines = FALSE,
                      calc_ecdf = TRUE, geom = "density_ridges_gradient", 
                      height = 1 , color = NA, position = position_dodge(0.5), alpha = 0.5) +
  
  scale_fill_brewer(palette = "Oranges")  +
  
  # Color settings
  scale_y_discrete(labels = function(y) str_wrap(y, width = 5), expand = expansion(add = 0.1)) +
  
  # Geometries
  geom_boxplot(
    width = 0.15, outlier.shape = NA) +
  stat_dots(
    side = "left", justification = 1.1, binwidth = 100000, dotsize = 0.01) +
  geom_text(data = median_S2prices, 
            aes(x = Median_Price, y = `Property Type`, 
                label = paste0("$", Median_Price/1000, "k")),
            hjust = -0.5, vjust = 0, size = 2) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5), labels = scales::label_number(scale = 1), limits = c(0, 30000000)) +
  
  # Coordinate and theme settings
  coord_flip() +
  theme_ipsum(base_size = 7, grid = "Y") +
  theme(plot.margin = margin(10, 10, 10, 10),
        plot.title = element_text(size = 12, hjust = 0.3),
        axis.title.x = element_text(hjust = 0.5, size = 8, face = "bold"),
        axis.title.y = element_text(hjust = 0.5, size = 8, face = "bold"),
        plot.background = element_rect(fill = "#FFFFFF", colour = "#f5f5f5"),
        panel.border = element_blank(), 
        legend.position = "top", 
        legend.justification = "right") +
  
  # Labels and annotations
  labs(x = "Transacted Price ($)", y = NULL, title = "Transacted Price Distribution") 

# Display the plot
#plot2b
Future Improvements
  1. 1 dot = 10 units to minimise number of dots

  2. Attempt to recolor the respective quantiles with the following r code, but unable to relabel the colors for the respective property type.

    Code
    quan_col <- c("Executive Condos.0.25" = "#DAFFFF", 
                  "Executive Condos.0.5" = "#BBFFFF", 
                  "Executive Condos.0.75" = "#99FFFF",
                  "Apartment.0.25" = "#E5F2FF", 
                  "Apartment.0.5" = "#CCE5FF",
                  "Apartment.0.75" = "#C0DEFF",
                  "Condos.0.25" = "#99CCFF", 
                  "Condos.0.5" = "#83C1FF", 
                  "Condos.0.75" = "#66B2FF")
  3. Attempt to minimize overlap for ggridges by:

    1. Adjusting height through stat_density_ridges(height = 0.5)
    2. Introducing transparency with stat_density_ridges(alpha = 0.5)
    3. Including space between y-axis scale_y_discrete(expand = expansion(add = 0.1))

3.4 Combining Plot I and Plot II

For High Rise building,

Code
hr_plot <- grid.arrange(plot1a, plot2a, ncol = 2) 
grid.rect(x = 0.5, y = 0.5, width = 1, height = 1, just = c("center", "center"), 
         gp = gpar(col = "#0080FF", fill = NA, lwd = 2))

Insights from High Rise (Unit Price $PSF vs Transacted Price $)
  1. We observed that “Apartment” typically has the highest price for both Unit Price ($PSF) and Transacted Price ($), followed by “Condo” and lastly “Executive Condos”.
  2. While the unit price of Executive Condo and Apartment is denser on the higher spectrums, the transacted price of these units are
  3. With Transacted Price ($) generally being the function of Unit Price ($PSF) and Area (SQFT) for new launches, working backwards, we can infer that “Apartment” are likely to have smaller unit sizes. (Rough estimation below, further visualization with Area (SQFT) is needed to validate this assumption as figures may be distorted with inflation from resale units)
High Rise Type Median Transacted Price ($) Median Unit Price ($PSF) Estimated Area (SQFT)
Executive Condos $1,400,000 $1,326 1,055
Apartment $1,676,855 $2,120.50 790
Condo $1520,000 $1,631 932

For Landed properties,

Code
l_plot <- grid.arrange(plot1b, plot2b, ncol = 2) 
grid.rect(x = 0.5, y = 0.5, width = 1, height = 1, just = c("center", "center"), 
         gp = gpar(col = "#FF8000", fill = NA, lwd = 2))

Insights from Landed Properties (Unit Price $PSF vs Transacted Price $)
  1. Interestingly, the unit price of “Terrace” is the highest at $1,416 psf, followed by “Semi-Detached” at $1,263 psf and lastly, “Detached” of lowest at $1,067 psf.

  2. Nevertheless, aligned with common perception, the Transacted Price ($) for “Detached” is the highest, with 75% approximately $14 million, and median price of $ 6.3 million. This is largely attributed to unit size (Area (SQFT)) not included in this visualisation.

3.5 Visualising relationship between Unit Price ($PSF) and Transacted Price ($) with scatter plot

To understand the relationship between Unit Price ($PSF) and Transacted Price ($), the most direct method is to use a scatter plot to identify any correlation. However, extremely high Transacted Price ($) due to exceptional purchases makes the visualisation skewed and majority of the Unit Price ($PSF) and Transacted Price ($) are clustered near the median.

Hence, ranking the transaction by percentile will evenly distribute the points within the scatter plot and identify any patterns.

3.5.1 - Having an end goal in mind.

With Tableau, we plotted a scatter chart and a combined interactive view

3.5.2 Overview of further areas for exploration with ggplotly and ggiraph.

  1. Introducing trend line to show the correlation between Unit Price ($PSF) and Transacted Price ($).

  2. Introducing separate views for each property type to de-clutter the scatter plot with ggplotly.

  3. Introducing interactivity such that the viewer can either:

    (a) Select price range of interest and see the list of past projects with ggiraph

    (b) Select project of interest by filtering to see the price range of this project. (Test out in the sketch done in Tableau)

3.5.3 - Introducing separate view of scatter plot with ggplotly

To minimise clutter, ggplotly is used to display different views based on the respective property types.

Code
# Processing data
pct <- resale_data %>%
  mutate(
    Tx_Percentile = percent_rank(Transacted_Price),
    PSF_Percentile = percent_rank(Unit_Price_PSF)
  )

## plot to see across different property
plot6 <- ggplot(data = pct,
                aes(x = PSF_Percentile, 
                    y = Tx_Percentile, 
                    colour = `Property Type`)) +
  scale_color_manual(values = prop_col) +
  labs(title = "Correlation Plot", 
       x = "Percentile PSF", 
       y = "Percentile Transacted Price") + 
  geom_point(aes(frame = `Property Type`)
             , alpha = 0.5) + 
   scale_x_continuous(labels = scales::percent_format(scale = 100), 
                      limits = c(0, 1)) +
  scale_y_continuous(labels = scales::percent_format(scale = 100), 
                     limits = c(0, 1)) + 
  theme(plot.margin = margin(10, 10, 10, 10),
        plot.title = element_text(size = 12, hjust = 0.3),
        axis.title.x = element_text(hjust = 0.5, size = 8, face = "bold"),
        axis.title.y = element_text(hjust = 0.5, size = 8, face = "bold"),
        panel.background = element_rect(fill = "#ffffff"), 
        panel.grid.major = element_line(color = "grey80", size = 0.5), 
    panel.grid.major.x = element_line(color = "grey80", size = 0.5), 
    panel.grid.major.y = element_line(color = "grey80", size = 0.5))
  

ggplotly(plot6)

3.5.4 Further drilling down with tool-tips & including interactivity

To illustrate interactivity across plot, we will be focusing on “Apartment” and other property type results can be illustrated by repeating the set of codes. The steps taken include:

  1. Introduce tooltip of Project Name, Transacted Price ($) and Unit Price ($PSF) into the respective dots by hovering above certain point of interest.
  2. Introduce data_id to highlight the data points belonging to the same Project Name to see the range of Transacted Price ($) and Unit Price ($PSF)
  3. Introduce best fit line with geom_smooth to identify the overall trend between Transacted Price ($) and Unit Price ($PSF)
  4. Combining with violin-plot and box–plot of Transacted Price ($) and Unit Price ($PSF) to visualise the relative distribution and position of this sale.
Code
# Split by subset and add interactive feature to identify project name

pct_apt <- subset(pct, pct$`Property Type` == "Apartment")
pct_c <- subset(pct, pct$`Property Type` == "Condo")
pct_ec <- subset(pct, pct$`Property Type` == "Executive Condos")
pct_t <- subset(pct, pct$`Property Type` == "Terrace")
pct_sd <- subset(pct, pct$`Property Type` == "Semi- Detached")
pct_d <- subset(pct, pct$`Property Type` == "Detached")

# Defining Tooltip
pct_apt$tooltip <- c(paste0("Project: ", pct_apt$Project_Name, 
                            "\n Tx Price: $ ", pct_apt$Transacted_Price, 
                            "\n Unit Price: $ ", pct_apt$Unit_Price_PSF))

# Formatting to resolve error
pct_apt$Project_Name <- gsub("'", "", pct_apt$Project_Name)

# Plotting interactive plot
plot_apt_int <- ggplot(data = pct_apt,
                aes(x = PSF_Percentile, 
                    y = Tx_Percentile, 
                    colour = `Property Type`)) +
  scale_color_manual(values = prop_col) +
  labs(title = "Correlation Plot", 
       x = "Percentile PSF", 
       y = "Percentile Transacted Price") + 
   geom_point_interactive(aes(
     tooltip = pct_apt$tooltip , 
     data_id = pct_apt$Project_Name), 
     alpha = 0.7) + 
  geom_smooth(method = "lm",   
              se = TRUE, level = 0.95, color = "black", linewidth = 0.1, fill = "#6f6f6f") + 
  scale_x_continuous(
    labels = scales::percent_format(accuracy = 1), breaks = seq(0, 1, by = 0.25)) +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1), breaks = seq(0, 1, by = 0.25)) + theme(legend.position = "none")

apt_interactive <- girafe(
  ggobj = plot_apt_int,                             
  width_svg = 6,                         
  height_svg = 6 * 0.618,
  options = list(                        
    opts_hover(css = "fill:#202020;"),  
    opts_hover_inv(css = "opacity:0.2;") 
  ))

apt_interactive
Code
# Violin plot for unit price
vplot1_apt <- ggplot(data = pct_apt, aes(x = `Unit_Price_PSF`,
                                 y = `Property Type`)) +
  geom_jitter_interactive(width = 0.2, 
                          color = "#004c99", size = 0.3, alpha = 0.2, 
                          aes(data_id = pct_apt$Project_Name, 
                              tooltip = pct_apt$tooltip)) +
  geom_violin(fill = "#CCE5FF", width = 0.3, alpha = 0.7) + 
  geom_boxplot(width = 0.1, fill = "white", alpha = 0.7) + 
  labs(title = "Violin Plot", x = "Unit Price (PSF)", y = "Property Type")

#vplot1_apt

# Violin plot for transacted price
vplot2_apt <- ggplot(data = pct_apt, aes(x = `Transacted_Price`,
                                 y = `Property Type`)) +
  geom_jitter_interactive(width = 0.2, 
                          color = "#004c99", size = 0.3, alpha = 0.2, 
                          aes(data_id = pct_apt$Project_Name, 
                              tooltip = pct_apt$tooltip)) +
  geom_violin(fill = "#CCE5FF", width = 0.3, alpha = 0.7) + 
  geom_boxplot(width = 0.1, fill = "white", alpha = 0.7) + 
  labs(title = "Violin Plot", x = "Transacted Price", y = "Property Type") + 
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5), labels = scales::label_number(scale = 1), limits = c(0, 4500000)) + coord_flip()

#vplot2_apt

# Combining into single view 
layout <- c(
  area(t=1, l=1, b=8, r=5),
  area(t=1, l=1, b=2, r=8), 
  area(t=3, l=6, b=8, r=8)
)

combined_plot <- girafe(code = print((plot_apt_int + vplot1_apt + vplot2_apt + plot_layout(design = layout))), 
       width_svg = 10,
       height_svg = 10,
       options = list(
         opts_hover(css = "fill: #202020;"),
         opts_hover_inv(css = "opacity:0.2;")
         )
       ) 

combined_plot

4 Reflecting on lessons learnt - “Less is better than more”

During the assignment, the option of combining 3 existing plots was considered. However, looking at the combined plot below, too much information with the rainbow plot makes the information presented to the user less comprehensible.

Hence, to simplify, a violin plot was opted with the individual data points reflected in the background to increase the scale of the complementary 2 plots with the main scatter plot in percentile terms.

Code
plot1_apt <- ggplot(pct_apt, aes(x = `Unit_Price_PSF`,
                                 y = `Property Type`, 
                                 fill = `Property Type`)) +
  # Color settings
  scale_fill_manual(values = prop_col) +
  scale_y_discrete(labels = function(y) str_wrap(y, width = 5)) +
  
  # Geometries
  stat_halfeye(
    adjust = 0.5, justification = -0.2, .width = 1, point_colour = NA) +
  geom_boxplot_interactive(
    width = 0.2, outlier.shape = NA, aes(data_id = pct_apt$Project_Name)) +
  stat_dots(
    side = "left", justification = 1.2, binwidth = 25, dotsize = 0.02) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5), labels = scales::label_number(scale = 1), limits = c(0, NA)) +
  
  # Coordinate and theme settings
  theme_ipsum(axis_title_size = 10, base_size = 7, grid = "Y") +
  theme(plot.margin = margin(10, 10, 10, 10),
        plot.title = element_text(size = 10, hjust = 0.3),
        axis.title.x = element_text(hjust = 0.5, size = 8),
        axis.title.y = element_text(hjust = 0.5),
        plot.background = element_rect(fill = "#FFFFFF", colour = "#f5f5f5"),
        panel.border = element_blank()) +
  
  # Labels and annotations
  labs(x = "Unit Price ($ PSF)", y = NULL,
       title = "Distribution of Unit Price ($ PSF) across Property Types",
       subtitle = "Higher Median Transaction Prices do not necessarily imply Higher Median Unit Price ($ PSF)", size = 5) 


plot2_apt <- ggplot(pct_apt, aes(x = `Transacted_Price`,
                                 y = `Property Type`, 
                                 fill = `Property Type`)) +
  # Color settings
  scale_fill_manual(values = prop_col) +
  scale_y_discrete(labels = function(y) str_wrap(y, width = 5)) +
  
  # Geometries
  stat_halfeye(
    adjust = 0.5, justification = -0.2, .width = 1, point_colour = NA) +
  geom_boxplot_interactive(aes(data_id = pct_apt$Project_Name), 
    width = 0.2, outlier.shape = NA) +
  stat_dots(
    side = "left", justification = 1.2, binwidth = 25, dotsize = 0.02) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5), labels = scales::label_number(scale = 1), limits = c(0, 4500000)) +
  
  
  # Coordinate and theme settings
  coord_flip() +
  theme_ipsum(axis_title_size = 10, base_size = 7, grid = "Y") +
  theme(plot.margin = margin(10, 10, 10, 10),
        plot.title = element_text(size = 10, hjust = 0.3),
        axis.title.x = element_text(hjust = 0.5, size = 8),
        axis.title.y = element_text(hjust = 0.5),
        plot.background = element_rect(fill = "#FFFFFF", colour = "#f5f5f5"),
        panel.border = element_blank()) +
  
  # Labels and annotations
  labs(x = "Transacted Price ($)", y = NULL,
       title = "Distribution of Unit Price ($ PSF) across Property Types", size = 5) 

#plot2_apt

#Recording attempt to combine, but ultimately sticking to simplicity to tell the story

test_1 <- girafe(code = print((plot_apt_int + plot1_apt + plot2_apt + plot_layout(design = layout))), 
       width_svg = 10,
       height_svg = 10,
       options = list(
         opts_hover(css = "fill: #202020;"),
         opts_hover_inv(css = "opacity:0.2;")
         )
       ) 

test_1